DAO-CP: Data-Adaptive Online CP decomposition for tensor stream

How can we accurately and efficiently decompose a tensor stream? Tensor decomposition is a crucial task in a wide range of applications and plays a significant role in latent feature extraction and estimation of unobserved entries of data. The problem of efficiently decomposing tensor streams has been of great interest because many real-world data change dynamically over time. However, existing methods for dynamic tensor decomposition sacrifice too much accuracy, which limits their use in practice. Moreover, the accuracy loss becomes even more serious when the tensor stream has an inconsistent temporal pattern, since current methods cannot adapt quickly to a sudden change in data. In this paper, we propose DAO-CP, an accurate and efficient online CP decomposition method which adapts to data changes. DAO-CP tracks local error norms of the tensor stream and detects change points in them. It then chooses the best strategy depending on the degree of change to balance the trade-off between speed and accuracy. Specifically, DAO-CP decides whether to (1) reuse the previous factor matrices for fast running time or (2) discard them and restart the decomposition to increase accuracy. Experimental results show that DAO-CP achieves state-of-the-art accuracy without noticeable loss of speed compared to existing methods.


Introduction
On the other hand, dynamic tensor decomposition methods aim to incrementally and quickly analyze tensors; however, the accuracy of existing methods is not satisfactory. Indeed, current dynamic methods (1) bypass the factor update for temporal mode to improve the speed [14], or (2) decompose the current tensor slice using prior factor matrices [15,16]. Unfortunately, these methods suffer from poor accuracy when the tensor stream has an inconsistent temporal pattern because they cannot adapt quickly to abrupt changes in data [17,18].
In this paper, we propose Data-Adaptive Online CP decomposition (DAO-CP), an accurate and efficient tensor stream decomposition algorithm which adapts to data changes. The main ideas of DAO-CP are to (1) detect change points of "themes" in a tensor stream by tracking local error norms, and (2) re-decompose the tensor stream whenever a new theme is discovered. DAO-CP automatically decides whether to reuse the previous results of decomposition or to discard them depending on how much change is detected in the tensor stream. Consequently, it provides accurate and fast decomposition for real-world datasets even with inconsistent temporal patterns. Furthermore, we introduce complementary matrices in order to reduce the redundant computations in CP-ALS optimization. We also simplify the estimation loss function from DTD [15] by fixing the non-temporal modes. As a result, DAO-CP achieves lower time complexity than the existing methods without accuracy loss. Through experiments, we show that DAO-CP outperforms the current state-of-the-art algorithms in terms of accuracy with little sacrifice in running time. We also investigate the sensitivity and the effect of hyperparameters of our proposed method.
The main contributions are summarized as follows:
• Method. We propose DAO-CP, an accurate and efficient online method for tensor stream decomposition.
• Analysis. We theoretically analyze the computational complexity of DAO-CP and compare it to existing methods.
• Experiments. DAO-CP shows the state-of-the-art accuracy on both synthetic and real-world datasets without significant loss of speed (see Fig 1).

Fig 1. Accuracy comparison between DAO-CP (proposed) and its competitors on Airport Hall (upper) and Sample Video (lower) datasets. DAO-CP automatically detects a change of theme (for example, an object starts moving or a scene changes) and re-decomposes the data depending on the degree of changes. Note that DAO-CP results in much clearer images than the competitors with little sacrifice in speed.

The code and datasets are available at https://github.com/snudatalab/DAO-CP.

The rest of this paper is organized as follows. We first describe preliminaries of tensor decomposition algorithms. We then present our proposed method in detail. After showing experimental results, we discuss related works, and conclude the paper.

Preliminaries
We describe preliminaries of tensors and tensor decomposition algorithms. Table 1 summarizes the symbols used in this paper.

Tensors
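Tensors are multi-dimensional arrays that generalize vectors (1-order tensors) and matrices (2-order tensors) to higher orders. We denote vectors with bold lowercase letters ($\mathbf{a}$), matrices with bold capital letters ($\mathbf{A}$), and tensors with bold calligraphic letters ($\mathcal{X}$). An N-th order tensor $\mathcal{X}$ has N modes whose lengths are $I_1, \dots, I_N$, respectively. A tensor can be unfolded or matricized along any of its modes [19], and the unfolded matrix of $\mathcal{X}$ along the n-th mode is denoted by $\mathbf{X}_{(n)}$. When a tensor is unfolded, its elements are reordered into a matrix form; the mode-n unfolding matrix satisfies $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$.

We define the Frobenius norm of a tensor using the notation $\lVert \cdot \rVert$ as follows:

$$\lVert \mathcal{X} \rVert = \sqrt{\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 \cdots i_N}^2}$$

In what follows, we briefly define several important matrix products. The Kronecker product $\mathbf{A} \otimes \mathbf{B}$ of matrices $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{B} \in \mathbb{R}^{K \times L}$ is a matrix of size IK × JL and defined as follows:

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & \cdots & a_{1J}\mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{I1}\mathbf{B} & \cdots & a_{IJ}\mathbf{B} \end{bmatrix}$$

The Hadamard product $\mathbf{A} ⊛ \mathbf{B}$ and Khatri-Rao product $\mathbf{A} \odot \mathbf{B}$ are two essential matrix products used in tensor decomposition. The Hadamard product is simply the element-wise product of two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same size. The Khatri-Rao product is a column-wise Kronecker product:

$$\mathbf{A} \odot \mathbf{B} = [\mathbf{a}_1 \otimes \mathbf{b}_1, \; \cdots, \; \mathbf{a}_J \otimes \mathbf{b}_J]$$

where $\{\mathbf{a}_n\}$ and $\{\mathbf{b}_n\}$ denote the column vectors of $\mathbf{A} \in \mathbb{R}^{I_A \times J}$ and $\mathbf{B} \in \mathbb{R}^{I_B \times J}$, respectively.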
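The operations defined above can be sketched in a few lines of NumPy. This is only an illustration: the helper names `unfold` and `khatri_rao` are hypothetical, and the column ordering of the unfolding (row-major over the remaining modes) is an assumption that may differ from the convention of [19].

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns the remaining modes.
    Assumption: columns follow row-major order of the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I_A x J) and b (I_B x J)."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices need the same number of columns")
    return np.einsum("ir,jr->ijr", a, b).reshape(a.shape[0] * b.shape[0], -1)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))      # a 3rd-order tensor
A = rng.standard_normal((3, 2))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((3, 2))         # same size as A, for the Hadamard product

frobenius = np.sqrt((X ** 2).sum())     # Frobenius norm of the tensor
kron = np.kron(A, B)                    # Kronecker product, size (3*4) x (2*2)
hadamard = A * C                        # element-wise (Hadamard) product
kr = khatri_rao(A, B)                   # Khatri-Rao product, size (3*4) x 2

print(unfold(X, 1).shape, kron.shape, hadamard.shape, kr.shape)
```

Each column of `kr` is the Kronecker product of the matching columns of `A` and `B`, which is what makes the Khatri-Rao product the natural companion of the unfolding in CP decomposition.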

Tensor decomposition
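CANDECOMP/PARAFAC (CP) decomposition is one of the most widely used methods for tensor decomposition, which is considered to be a key building block in many other variants [1,20]. CP decomposition factorizes a tensor into a sum of rank-one tensors:

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r^{(1)} \circ \mathbf{a}_r^{(2)} \circ \cdots \circ \mathbf{a}_r^{(N)}$$

where $\circ$ denotes the vector outer product and the number R of rank-one tensor sets is called the rank of the resulting tensor. The factor matrices $\{\mathbf{A}_1, \dots, \mathbf{A}_N\}$ refer to the combination of the vectors from the rank-one components, i.e.,

$$\mathbf{A}_n = [\mathbf{a}_1^{(n)}, \; \cdots, \; \mathbf{a}_R^{(n)}] \in \mathbb{R}^{I_n \times R}$$

We express the CP decomposition result of a tensor $\mathcal{X}$ using the Kruskal operator 〚·〛 and the unfolding matrix, where the Kruskal operator provides a shorthand notation for the sum of outer products of the columns in factor matrices [21]:

$$\mathcal{X} \approx 〚\mathbf{A}_1, \dots, \mathbf{A}_N〛, \qquad \mathbf{X}_{(n)} \approx \mathbf{A}_n \Big( \odot_{k \neq n} \mathbf{A}_k \Big)^{\top}$$

Here $\odot_{k \neq n} \mathbf{A}_k$ denotes the Khatri-Rao product of all factor matrices except the n-th. Then, CP decomposition aims to find the factor matrices that minimize the estimation error L defined as follows:

CP alternating least squares (CP-ALS) has been extensively used for this optimization problem. The main idea of ALS is to divide the original problem into N sub-problems, where each sub-problem corresponds to updating one factor matrix while keeping all the others fixed [20]: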
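A minimal sketch of CP-ALS in NumPy may help make the alternating scheme concrete. It uses the textbook least-squares update, in which the n-th factor is obtained from the mode-n unfolding, the Khatri-Rao product of the other factors, and the pseudo-inverse of the Hadamard product of their Gram matrices. The function names, the random initialization and the fixed iteration count are assumptions for illustration, not the implementation used in the experiments.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding with row-major ordering of the remaining modes (assumed)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def khatri_rao_except(factors, skip):
    """Khatri-Rao product of all factor matrices except factors[skip]."""
    mats = [f for k, f in enumerate(factors) if k != skip]
    out = mats[0]
    for m in mats[1:]:
        out = np.einsum("ir,jr->ijr", out, m).reshape(-1, m.shape[1])
    return out

def reconstruct(factors):
    """Kruskal operator: sum of outer products of matching columns."""
    full = factors[0] @ khatri_rao_except(factors, 0).T
    return full.reshape([f.shape[0] for f in factors])

def cp_als(tensor, rank, n_iter=50, seed=0):
    """Plain CP-ALS: update one factor at a time, keeping the others fixed."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for n in range(tensor.ndim):
            kr = khatri_rao_except(factors, n)
            gram = np.ones((rank, rank))
            for k, f in enumerate(factors):
                if k != n:
                    gram *= f.T @ f          # Hadamard product of Gram matrices
            factors[n] = unfold(tensor, n) @ kr @ np.linalg.pinv(gram)
    return factors

# Usage: decompose a small synthetic tensor built from known rank-3 factors.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((d, 3)) for d in (6, 5, 4)]
X = reconstruct(true_factors)
estimated = cp_als(X, rank=3)
relative_error = np.linalg.norm(X - reconstruct(estimated)) / np.linalg.norm(X)
print(f"relative error: {relative_error:.3e}")
```

Because every sub-problem is an ordinary least-squares fit, the reconstruction error never increases from one update to the next; how quickly it approaches zero depends on the data and the initialization.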

Online tensor decomposition
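Of particular interest in the problem of tensor decomposition is an efficient online algorithm for time-evolving tensors. We think of a tensor as a set of "slices" given at each time step. Given an N-order time-evolving tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, we expand it in the form $[\mathcal{X}_{old}; \mathcal{X}_{new}]$, where $\mathcal{X}_{old} \in \mathbb{R}^{I_1^{old} \times \cdots \times I_N}$ is the previous tensor data and $\mathcal{X}_{new} \in \mathbb{R}^{I_1^{new} \times \cdots \times I_N}$ is a new tensor slice for one time step. Then, the goal is to efficiently decompose the tensor $\mathcal{X}$ given the previous decomposition result $\mathcal{X}_{old} \approx 〚\tilde{\mathbf{A}}_1, \dots, \tilde{\mathbf{A}}_N〛$:

$$\mathcal{X} \approx 〚\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_N〛, \qquad \mathbf{A}_1 = [\mathbf{A}_1^{(0)}; \mathbf{A}_1^{(1)}]$$

where $\mathbf{A}_1^{(0)} \in \mathbb{R}^{I_1^{old} \times R}$ and $\mathbf{A}_1^{(1)} \in \mathbb{R}^{I_1^{new} \times R}$. This is done by minimizing the estimation error L defined as follows: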
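The partition of the temporal factor can be illustrated with a small NumPy sketch: stacking the old and new temporal rows and applying the Kruskal operator gives the same result as reconstructing the old data and the new slice separately and concatenating them along the first mode. The variable names and the 3rd-order restriction are assumptions made for brevity.

```python
import numpy as np

def kruskal(a, b, c):
    """Kruskal operator for a 3rd-order tensor: sum of column-wise outer products."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)

rng = np.random.default_rng(0)
R = 3
A1_old = rng.standard_normal((8, R))   # temporal rows for the previously stacked data
A1_new = rng.standard_normal((2, R))   # temporal rows for the new slice
A2 = rng.standard_normal((4, R))       # non-temporal factors, shared by old and new
A3 = rng.standard_normal((5, R))

A1 = np.vstack([A1_old, A1_new])       # partitioned temporal factor
X_hat = kruskal(A1, A2, A3)

stacked = np.concatenate(
    [kruskal(A1_old, A2, A3), kruskal(A1_new, A2, A3)], axis=0
)
print(X_hat.shape, np.allclose(X_hat, stacked))
```

This is why an online method only needs to append rows to the temporal factor when a new slice arrives, while the non-temporal factors are shared by all time steps.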

Related works
Tensor stream decomposition is widely studied under CP decomposition [22,23]. Existing works employ one of the following two ideas: they update (1) only the non-temporal factors with precomputed auxiliary matrices [14], or (2) whole factors considering prior decomposition results [15,16]. We describe three main approaches (OnlineCP, SeekAndDestroy, and DTD) for dynamic tensor decomposition, and compare them with our proposed method.

OnlineCP
OnlineCP [14] preserves the previous temporal factor to efficiently decompose new tensor slices. After updates of non-temporal factors and the partial temporal factor, it simply appends a part of the temporal factor matrix to the previous matrix. OnlineCP avoids duplicated computations such as Khatri-Rao and Hadamard products by introducing auxiliary matrices. It computes complementary matrices before ALS iteration and yields a new decomposition. Despite its low computational cost, the approach cannot achieve an accurate decomposition because it does not account for changes of theme in data (see Fig 2). Note that DAO-CP solves this problem by tracking local error norms of the tensor stream and detecting a change point of themes, which enables an accurate decomposition even when the data have an inconsistent temporal pattern.

SeekAndDestroy
SeekAndDestroy [16] additionally uses rank estimation to discover latent concepts and detect concept drift in streaming tensors. The method estimates the rank of each incoming tensor slice, and updates the previous decomposition after alleviating concept drift. However, SeekAndDestroy requires extra computation due to the rank estimation for every time step, which causes a substantial loss of speed. Moreover, it consistently performs worse than OnlineCP when the initial rank of OnlineCP is fine-tuned. Note that our proposed method efficiently detects the change of theme in streaming data because it does not require estimating the actual rank numbers, but only tracks local error in order to rapidly capture the change points.

DTD
DTD [15] was originally introduced as a part of MAST, which is a low-rank tensor completion method to fill in the missing entries of an incomplete multi-aspect tensor stream. The method manages to reduce the time complexity by reusing the previous decomposition that approximates the tensor stacked until new slices come in. Specifically, for an N-th order tensor stream, DTD partitions the data into $2^N$ sub-tensors for each time step and uses binary tuples $(i_1, \dots, i_N) \in \Theta = \{0, 1\}^N$ to denote the sub-tensors. Then, given that $〚\tilde{\mathbf{A}}_1, \dots, \tilde{\mathbf{A}}_N〛$ approximates $\mathcal{X}^{(0, \dots, 0)} := \mathcal{X}_{old}$, one can reformulate the estimation error L of online tensor decomposition as follows (see the notations from Preliminaries):

where $\mu \in [0, 1]$ is the forgetting factor which alleviates the influence of the previous decomposition error. Although DTD is an efficient method, it suffers from poor accuracy when an incoming tensor has an entirely different pattern compared to previous tensors, as it still tries to reuse the prior decomposition result. Our proposed method addresses the problem by using a "re-decomposition" process and adapting quickly to sudden changes in data, which significantly increases the accuracy of decomposition.
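To make the binary-tuple indexing concrete, the following sketch enumerates the $2^N$ sub-tensors of a tensor that has grown along every mode. It is an illustration under an assumed reading of the notation: bit 0 selects the index range already present in the old data and bit 1 selects the newly added range, so the all-zero tuple is the old tensor. The function name `partition` is hypothetical.

```python
from itertools import product

import numpy as np

def partition(tensor, old_shape):
    """Map every binary tuple in {0, 1}^N to the corresponding sub-tensor.
    Assumption: 0 = indices below old_shape[n], 1 = indices added afterwards."""
    parts = {}
    for bits in product((0, 1), repeat=tensor.ndim):
        index = tuple(
            slice(0, old) if bit == 0 else slice(old, None)
            for bit, old in zip(bits, old_shape)
        )
        parts[bits] = tensor[index]
    return parts

X = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)
parts = partition(X, old_shape=(3, 3, 2))

print(len(parts))                 # number of sub-tensors for a 3rd-order tensor
print(parts[(0, 0, 0)].shape)     # the old tensor
print(parts[(1, 0, 0)].shape)     # growth along the first mode only
```

In the setting of this paper only the temporal mode grows, so every sub-tensor with a 1 in a non-temporal position is empty; this is the observation that later allows the DTD loss to be restricted.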

Proposed method
We propose DAO-CP, an accurate and efficient online algorithm for tensor stream decomposition.
Overview
DAO-CP is a time and memory efficient algorithm for accurate online CP-ALS tensor decomposition which adapts to data changes. The challenge of decomposing time-evolving tensors is to improve accuracy without sacrificing speed and memory usage. Considering that the themes of data change over time, we propose detecting the change points of themes and using different strategies depending on the degree of change. The main challenges are as follows:
1. Reduce computational cost. How can we reduce the arithmetic cost for updating decomposition factors of tensor streams?
2. Identify themes in data streams. How can we capture the latent themes in tensor streams and detect the change points of them?
3. Increase decomposition accuracy. How can we exploit the detected change points of themes and increase the decomposition accuracy?
To address the above challenges, we propose the following approaches.
1. Build an updatable framework for tensor stream. We use complementary matrices and previous decomposition results recursively, where the complementary matrices are updated only when there is a change in non-temporal factors, thus reducing the redundant operations.
2. Detect data changes by tracking error norms. We continuously track the error norms of incoming data slices in the tensor stream, detecting a sudden accuracy drop based on z-score analysis, which we regard as a change point of themes.
3. Re-decompose the tensor stream when a new theme is detected. Once a sudden change in theme is detected, we choose whether to refine or split the tensor stream depending on the degree of changes. We also introduce memory rate to improve the refinement process.
These techniques determine how much information from the previous decomposition should be retained, balancing the trade-off between accuracy and speed.

Update rules for DAO-CP
Let $\mathcal{X} = [\mathcal{X}_{old}; \mathcal{X}_{new}]^{\top}$ be an N-order time-evolving tensor, where $\mathcal{X}_{old} \in \mathbb{R}^{I_1^{old} \times \cdots \times I_N}$ is the previous tensor data and $\mathcal{X}_{new} \in \mathbb{R}^{I_1^{new} \times \cdots \times I_N}$ is a new tensor slice; we assume that the first mode is the temporal mode. We design our update rules to efficiently decompose the tensor $\mathcal{X} \approx 〚\mathbf{A}_1, \dots, \mathbf{A}_N〛$, given the previous decomposition result $\mathcal{X}_{old} \approx 〚\tilde{\mathbf{A}}_1, \dots, \tilde{\mathbf{A}}_N〛$. We partition the temporal factor matrix $\mathbf{A}_1$ into old and new parts as $\mathbf{A}_1 = [\mathbf{A}_1^{(0)}; \mathbf{A}_1^{(1)}]$ with $\mathbf{A}_1^{(0)} \in \mathbb{R}^{I_1^{old} \times R}$ and $\mathbf{A}_1^{(1)} \in \mathbb{R}^{I_1^{new} \times R}$, where R is the decomposition rank. In order to consider the degree of change in themes, we introduce the memory rate $\rho \in [0.5, 1]$, which determines how much weight to assign to the decomposition of the previous tensor data. We define the estimation error L as a restricted form of the one from DTD [15], where the non-temporal modes of the tensor stream are fixed:

The optimization of the estimation error L is based on CP-ALS [14,20]. Note that we simplify the estimation error from DTD by setting the changes in non-temporal modes to zeros because there is a change only in the temporal mode for our problem. The update rules to minimize L in (11) for each factor matrix are derived as follows:

Note that we also update the prior temporal factors to further increase the accuracy of decomposition. If the previous temporal factors are not updated, they harm the accuracy of the method whenever there is a change of theme because they are optimized only for the previous theme of the data. However, it is computationally demanding to directly apply these recursive processes.
To address the problem, we introduce two complementary matrices G and H, where G and H are updated only when there is a change in non-temporal factors, thus reducing the redundant computations. This leads to the following modified update rules: The overall update process is outlined in Algorithm 1.
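The saving offered by the complementary matrices can be sketched as follows. The sketch assumes, based on the operands named in Lemma 1, that G and H are Hadamard products over the non-temporal modes of the R × R matrices $\tilde{\mathbf{A}}_k^{\top} \mathbf{A}_k$ and $\mathbf{A}_k^{\top} \mathbf{A}_k$, respectively; the helper names are hypothetical. It also checks the standard identity that such a Hadamard product of Gram matrices equals the Gram matrix of the much larger Khatri-Rao product, so the latter never has to be formed.

```python
import numpy as np

def hadamard_of_grams(left, right):
    """Element-wise product over k of left[k].T @ right[k] (each R x R)."""
    rank = left[0].shape[1]
    out = np.ones((rank, rank))
    for l_k, r_k in zip(left, right):
        out *= l_k.T @ r_k
    return out

def khatri_rao(mats):
    """Khatri-Rao product of a list of matrices with the same column count."""
    out = mats[0]
    for m in mats[1:]:
        out = np.einsum("ir,jr->ijr", out, m).reshape(-1, m.shape[1])
    return out

rng = np.random.default_rng(0)
R = 3
dims = (4, 5, 6)                                        # non-temporal mode lengths
A_prev = [rng.standard_normal((d, R)) for d in dims]    # previous factors (tilde A_k)
A_curr = [rng.standard_normal((d, R)) for d in dims]    # current factors (A_k)

G = hadamard_of_grams(A_prev, A_curr)                   # assumed form of G
H = hadamard_of_grams(A_curr, A_curr)                   # assumed form of H

# Same matrices via explicit Khatri-Rao products (120 x 3 here, far larger in practice).
G_naive = khatri_rao(A_prev).T @ khatri_rao(A_curr)
H_naive = khatri_rao(A_curr).T @ khatri_rao(A_curr)
print(np.allclose(G, G_naive), np.allclose(H, H_naive))
```

Since G and H depend only on the non-temporal factors, they can be cached and refreshed only when one of those factors changes.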

Change points detection with local error norm
The key of our proposed method is to detect change points of themes in tensor streams and thereby adapt quickly to abrupt changes in data. To do this, we continuously track the decomposition error of tensor streams and detect a sudden accuracy drop, which we regard as a change point of themes. Such an accuracy drop is captured by measuring the local error norm $E_{local}$ for the new tensor slice and its decomposition result:

$$E_{local} = \lVert \mathcal{X}_{new} - 〚\mathbf{A}_1^{(1)}, \mathbf{A}_2, \dots, \mathbf{A}_N〛 \rVert$$

We assume that $E_{local}$ follows a normal distribution $E_{local} \sim \mathcal{N}(m, s^2)$ and keep track of its mean $m(E_{local})$ and variance $s^2(E_{local})$ to detect outliers. Note that we should update the mean and variance in an online manner. This is achievable by using Welford's algorithm [24,25], which provides accurate estimates of mean and variance without the necessity of keeping the entire data. Moreover, the method requires only one pass over the given data in order to compute their sample mean and variance. Using Welford's algorithm, we detect outliers in the current local error norm by z-score analysis with the following criterion:

$$\frac{E_{local} - m(E_{local})}{s(E_{local})} > L$$

where L is a threshold of anomaly. By changing the value of L, one can fine-tune the criterion on whether a new tensor slice is similar to the previous tensors or not.
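The online tracking described above can be sketched with a small Welford accumulator. The class below is a minimal illustration using only the standard library; the class and method names are hypothetical, and computing the z-score of a new error norm before adding it to the running statistics is an assumption about the order of operations.

```python
import math

class Welford:
    """One-pass running mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def z_score(self, x):
        std = math.sqrt(self.variance)
        return (x - self.mean) / std if std > 0 else 0.0

# Usage: error norms stay near 1.0, then one slice is decomposed much worse.
tracker = Welford()
for error_norm in [1.0, 1.1, 0.9, 1.05, 0.95]:
    tracker.update(error_norm)

threshold = 3.0                       # hypothetical anomaly threshold L
z = tracker.z_score(3.0)
print(z > threshold)
```

Only three numbers are stored regardless of the stream length, which is what makes the change detection compatible with an online setting.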

Re-decomposition process
Once a sudden change of theme is detected by z-score analysis, DAO-CP exploits this information to increase the decomposition accuracy. As a new tensor slice is stacked for each time step, DAO-CP updates the factor matrices following the optimization scheme described in Algorithm 1. It then computes the z-score in the local error norm distribution and performs "re-decomposition" depending on the score. Tracking the distribution of local error norm and setting the z-score criteria enable DAO-CP to automatically choose the best strategy between split and refinement processes depending on the degree of changes. Fig 3 illustrates the intuition of the two processes, and Table 2 shows the criterion for each process.

Split process. What if an incoming tensor slice has an entirely different theme compared to previous tensors? In this case, reusing the prior results of decomposition will cause a substantial loss of accuracy. To address the problem, we design a split process which divides the streaming tensor into separate tensors of different themes, using a threshold $L_s$. Despite extra costs of space and time due to re-initialization, the split process enables DAO-CP to successfully avoid the unexpected accuracy drop (lines 8-13 in Algorithm 2).

Refinement process. The refinement process is used to update the decomposition result when there is only a modest difference in the theme from the previous tensor. We use the hyperparameter $L_r$ to fine-tune the refinement criterion on whether an incoming tensor slice is similar to the previous tensors or not. We use the memory rate $1 - \rho$ because we need to focus more on the new slice (note that $1 - \rho \leq \rho$ since $\rho \in [0.5, 1]$). The number of ALS iterations also depends on the z-score: additional iterations are performed for a higher z-score, because a higher z-score implies a more abrupt change in data.

As a result, these techniques determine how much information from the previous decomposition should be retained, balancing the trade-off between accuracy and running time (lines 14-16 in Algorithm 2). The full computation of DAO-CP is outlined in Algorithm 2.
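The choice between the two processes can be summarized in a few lines. This is a sketch only: the function name, the returned labels, and the handling of values exactly equal to a threshold are assumptions, and the rule for how many extra ALS iterations a refinement performs is not reproduced here.

```python
def choose_process(z_score, l_split, l_refine, rho):
    """Pick a strategy from the z-score of the current local error norm.

    Returns (process, memory_rate): the memory rate is the weight given to
    the previous decomposition, or None when the previous factors are discarded.
    """
    if l_refine > l_split:
        raise ValueError("the refinement threshold must not exceed the split threshold")
    if z_score > l_split:
        return "split", None          # entirely new theme: re-initialize
    if z_score > l_refine:
        return "refine", 1.0 - rho    # modest change: focus more on the new slice
    return "update", rho              # no change detected: regular online update

# Usage with hypothetical thresholds and the memory rate used in the experiments.
for z in (0.4, 2.0, 5.0):
    print(z, choose_process(z, l_split=3.0, l_refine=1.5, rho=0.8))
```

Lowering either threshold makes the corresponding process fire more often, which is the knob that trades running time for accuracy.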

Algorithm 2: Data-Adaptive Online CP Decomposition (DAO-CP)
Input: Tensor stream $\mathcal{X}_{stream}$, memory rate $\rho$, and number of ALS iterations $n_{iter}$
Output: Decomposition factor set S = {〚A_1, …
…
   Store the previous factors to S
10 Initialize $〚\tilde{\mathbf{A}}_1, \dots, \tilde{\mathbf{A}}_N〛$ using CP decomposition of $\mathcal{X}_{new}$
11 Calculate error norm $E_{local}$ between $\mathcal{X}_{new}$ and $〚\tilde{\mathbf{A}}_1, \dots, \tilde{\mathbf{A}}_N〛$
12 Initialize Welford with $E_{local}$
13 continue
…
   Calculate error norm $E_{local}$ between $\mathcal{X}_{new}$ and 〚A_1, …
…
   Store the previous factors to S

Fig 3. When the z-score exceeds the split threshold $L_s$ (e.g., change from theme A to B), DAO-CP re-initializes the new tensor slice using the static CP decomposition. When the z-score is between $L_r$ and $L_s$ (e.g., change from theme B to B′), the refinement process determines how much information from the previous factors should be retained. Consequently, DAO-CP provides both fast and accurate decomposition for tensor streams even with inconsistent temporal patterns. https://doi.org/10.1371/journal.pone.0267091.g003

Table 2. Execution criteria for split and refinement processes. $L_s$ is the threshold of splitting and initializing the decomposition, and $L_r$ is the threshold of refining the previous decomposition. By changing the two hyperparameters, we can fine-tune the re-decomposition process to balance the trade-off between accuracy and speed.

Process       Criterion
Split         z-score exceeds $L_s$
Refinement    z-score is between $L_r$ and $L_s$

Theoretical analysis
We analyze the computational complexity of DAO-CP. The following symbols are used for the analysis: N (dimensionality), R (rank), $I_1^{new}$ (time length of the new data slice), $I_1^{old}$ (time length of the formerly stacked data), $I_{i \neq 1}$ (mode length of the non-temporal i-th mode), and $n_{iter}$ (number of ALS iterations). Table 3 summarizes the comparison of DAO-CP to existing tensor decomposition methods. We find that DAO-CP has the lowest arithmetic cost among the methods except OnlineCP. Note that even though OnlineCP has lower complexity than our proposed method, it suffers from poor accuracy because it does not account for temporal changes in data, which limits its usage in practice (see Fig 4).

Lemma 1. The time complexity of initializing the complementary matrices G and H by (12) is $O(R^2 \sum_{k \neq 1} I_k)$.

Proof. The operands $\tilde{\mathbf{A}}_k^{\top} \mathbf{A}_k$ and $\mathbf{A}_k^{\top} \mathbf{A}_k$ in the Hadamard operations are R × R matrices, and the multiplication of $\tilde{\mathbf{A}}_k^{\top} \in \mathbb{R}^{R \times I_k}$ and $\mathbf{A}_k \in \mathbb{R}^{I_k \times R}$ takes $O(R^2 I_k)$ operations. Thus, the total arithmetic complexity of computing the matrices G and H is given by $O(R^2 \sum_{k \neq 1} I_k)$.
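As a rough illustration of Lemma 1, the following worked arithmetic plugs in the non-temporal mode lengths of a Sample Video slice, (240, 320, 3), together with a rank R = 20 that is assumed here only for the sake of the example. It compares the cost in Lemma 1 with the cost of obtaining the same R × R matrix from an explicit Khatri-Rao product, whose number of rows is the product of the mode lengths rather than their sum.

```latex
\begin{align*}
R^2 \sum_{k \neq 1} I_k  &= 20^2 \,(240 + 320 + 3) = 400 \cdot 563 = 225{,}200, \\
R^2 \prod_{k \neq 1} I_k &= 20^2 \,(240 \cdot 320 \cdot 3) = 400 \cdot 230{,}400 = 92{,}160{,}000.
\end{align*}
```

Under these assumed values, working with the per-mode Gram matrices is about 409 times cheaper than forming the Khatri-Rao product first, and the gap widens as the mode lengths grow.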

DTD [15]: $O(N R I_1^{new}$ …

The computational cost of updating all the non-temporal factor matrices is … by Lemma 4, which can be written as $O(N R I_1^{new}$ … Combining the complexities of updating $\{\mathbf{A}_i\}$ by Lemmas 2 and 3 gives the following arithmetic cost for a single ALS iteration: … Thus, including the initialization cost from Lemma 1, we obtain the desired result.

Experiments
In this section, we experimentally evaluate DAO-CP to answer the following questions.
• Q1. Reconstruction error. How accurately does DAO-CP decompose real-world tensor streams compared to existing methods?
• Q2. Time cost. How much does DAO-CP improve the speed of tensor stream decomposition compared to existing online methods?
• Q3. Effect of thresholds $L_r$ and $L_s$. How do the different choices of $L_r$ and $L_s$ for re-decomposition criteria affect the performance of DAO-CP?
• Q4. Refinement and split processes. How does each of the re-decomposition processes affect the performance of DAO-CP on real-world datasets?
In the following, we describe the experimental settings and answer the questions with the experimental results.

Fig 4. Since Full-CP is not an online method, we evaluate its fitness whenever a new slice is added. Detecting the change points of theme, DAO-CP successfully increases the accuracy of decomposition, which is even higher than that of Full-CP. https://doi.org/10.1371/journal.pone.0267091.g004

Experimental settings
All the experiments are conducted on a workstation with a single CPU (Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz).
Datasets. We use four real-world tensor streams and a synthetic tensor stream summarized in Table 4, where the first mode of each tensor corresponds to the temporal mode. We construct a tensor stream by splitting an original tensor into slices along the time mode (first mode); e.g., we make 41 slices from Sample Video dataset, where each slice is of dimension (5,240,320,3).
• Sample Video dataset is a series of animation frames with RGB values. For this dataset, we expect that changes of theme occur when an object starts moving or a scene changes.
• Airport Hall is a video recorded in an airport, initially used to verify OLSTEC [17,27]. We expect that a sudden change of theme occurs when a crowd of people surges toward the airport during flight departure or arrival time.
• Korea Air Quality dataset consists of daily air pollutant levels for various locations in Seoul, South Korea from Sep 1, 2018 to Sep 31, 2019. The themes may continuously change depending on weather environment.
• Synthetic is made of concatenated tensors, each of which is the summation $\mathcal{T}_{main} + \mathcal{T}_{theme} + \mathcal{T}_{noise}$ of three tensors $\{\mathcal{T}_{main}, \mathcal{T}_{theme}, \mathcal{T}_{noise}\}$, referring to $\{\mathcal{N}(0, 100), \mathcal{N}(0, 10), \mathcal{N}(0, 1)\}$ normally distributed randomized tensors, respectively, of size (1000, 10, 20, 30). Note that we simulate the changes of theme using the tensor $\mathcal{T}_{theme}$.
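The construction of a tensor stream from a stored tensor can be sketched as below. The generator name `make_stream` is hypothetical, and the example array is a placeholder of zeros with reduced spatial dimensions (24 × 32 instead of 240 × 320) so that it stays small; only the number of frames, 205, is chosen to match the 41 slices of 5 frames mentioned for the Sample Video dataset.

```python
import numpy as np

def make_stream(tensor, slice_len):
    """Yield consecutive slices of `slice_len` time steps along the first mode."""
    for start in range(0, tensor.shape[0], slice_len):
        yield tensor[start:start + slice_len]

# Placeholder video: 205 frames, reduced resolution, 3 color channels.
video = np.zeros((205, 24, 32, 3))
slices = list(make_stream(video, slice_len=5))

print(len(slices), slices[0].shape)
```

Each yielded slice plays the role of the new tensor slice at one time step, while the slices consumed so far form the previously stacked data.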

Competitors.
We compare DAO-CP with existing dynamic tensor decomposition methods including OnlineCP [14] and DTD [15], as well as with the static CP decomposition method, Full-CP [20]. All the methods are implemented in Python3 using the TensorLy library.
Parameters. The parameters $L_s$ and $L_r$ of DAO-CP are set to the values listed in Table 4. The section "Effect of thresholds $L_r$ and $L_s$" is an exception, where we vary the two values to investigate the effect of different thresholds. We set the memory rate to $\rho = 0.8$ for all the experiments.
Evaluation measure. To evaluate our proposed method, we use local and global error norms $E_{local}$ and $E_{global}$, as well as the corresponding "fitness" scores $F_{local}$ and $F_{global}$, which are defined as follows:

$F_{local}$ denotes the fitness for an incoming data slice at each time step, while $F_{global}$ is the fitness for whole tensors. They are the normalized versions of error norms with respect to data size, designed to compare the decomposition accuracy for multiple datasets with different sizes.

Running time. We evaluate the speed of each method in terms of local running time, which is the elapsed time for decomposing the current data slice. Because Full-CP is not an online algorithm, we assume that it decomposes the entire tensor whenever a new data slice comes in.
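A possible way to compute such scores is sketched below. It assumes the commonly used normalization, fitness = 1 − (error norm) / (norm of the data), which is an assumption made for this illustration; the function names are hypothetical. The same function serves for the local score (applied to a new slice and its reconstruction) and the global score (applied to the whole tensor).

```python
import numpy as np

def error_norm(data, estimate):
    """Frobenius norm of the residual between the data and its reconstruction."""
    return np.linalg.norm(data - estimate)

def fitness(data, estimate):
    """Assumed normalization: 1 - ||data - estimate|| / ||data||."""
    return 1.0 - error_norm(data, estimate) / np.linalg.norm(data)

rng = np.random.default_rng(0)
X_new = rng.standard_normal((5, 10, 20))                 # an incoming slice
X_hat = X_new + 0.1 * rng.standard_normal(X_new.shape)   # a slightly noisy reconstruction

print(f"E_local = {error_norm(X_new, X_hat):.3f}, F_local = {fitness(X_new, X_hat):.3f}")
```

With this normalization a perfect reconstruction scores 1 and an all-zero reconstruction scores 0, so values are comparable across datasets of different sizes.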

Reconstruction error
We compare DAO-CP to its competitors in terms of fitness, varying the decomposition rank in Fig 4. The average of local fitness is the mean of F local that is computed at every time step.
Note that DAO-CP shows higher fitness than the existing methods in most cases, regardless of ranks.

Running time
DAO-CP allows an accurate tensor factorization by exploiting the characteristics of data and detecting change points. However, this results in a slightly longer running time due to the re-decomposition process. Fig 5 shows the running times of DAO-CP and other methods for various ranks. Note that DAO-CP has moderate running times between the static and dynamic decomposition methods: its speed is comparable to the other dynamic algorithms (DTD and OnlineCP) and it is significantly faster than the static method (Full-CP).

Effect of thresholds $L_r$ and $L_s$
We change the values of $L_s$ and $L_r$ to investigate the effect of split and refinement processes. Table 5 shows the results, where the number of refinement or split points changes as $L_r$ and $L_s$ vary. Note that both processes lead to more accurate decomposition with extra time costs, and among them the split process has bigger trade-offs because it requires re-initialization. As a result, the more parts the tensor is split into (smaller $L_s$), the more accurate decomposition DAO-CP yields with extra costs of time. More refinement processes (smaller $L_r$) also have a similar effect, although the trade-offs are relatively small. In contrast to the split process, the refinement process requires more memory to store intermediate data such as auxiliary matrices G and H. From a practical standpoint, these observations are very useful because one can benefit from the hyperparameter tuning when one of accuracy, speed, or memory usage is of particular importance.

Refinement and split processes
Recall that the split process is used to start a new decomposition when an entirely different theme is detected, while the refinement process is used when there is only a modest difference from the previous decomposition. Fig 1 validates the importance of these intuitions, showing that DAO-CP results in remarkable performance for the video datasets with different scenes and object movements.

Fig 5. Since Full-CP is not an online method, we evaluate its fitness whenever a new slice is added. Note that DAO-CP results in a promising speed comparable to DTD and OnlineCP with much more accurate decomposition, and is significantly faster than Full-CP. https://doi.org/10.1371/journal.pone.0267091.g005

Table 5. Effect of thresholds $L_r$ and $L_s$. The memory usage means the summation of byte allocation to store intermediate data to calculate next decomposition results (e.g., auxiliary matrices G and H). We use Korea Air Quality dataset with rank 20, and change $L_r$ and $L_s$ to investigate the effect of refinement and split processes. Note that the lower the thresholds are set, the more frequently the re-decomposition processes are executed. Thus, one can benefit from this observation when one of accuracy, speed, or memory usage is of particular importance depending on target tasks.

To further investigate the effects of split and refinement processes, we consider the following question: for each time step (or data slice), how does the re-decomposition process affect the running time and local error norm? With Sample Video dataset, we compare the running time and local error norm of DAO-CP to its competitors in Fig 6. We observe that both split and refinement processes significantly reduce the local error norm with only a modest sacrifice of running time.

Conclusions
In this paper, we propose DAO-CP, an efficient algorithm for decomposing time-evolving tensors. DAO-CP automatically detects a change point of theme in tensor streams and decides whether to re-decompose the tensors or not. Experimental results show that the proposed DAO-CP outperforms the current state-of-the-art methods on both synthetic and real-world datasets. We also investigate the effect of hyperparameters of our proposed method and demonstrate the advantages of trading off between accuracy, speed, and memory usage.

Fig 6. Each re-decomposition process (at split point) significantly reduces the local error norm with only a modest sacrifice of running time (e.g., vertical line connecting $P_{prev}$, $P_{next}$, $Q_{prev}$, and $Q_{next}$). Note that DAO-CP runs slower than the other dynamic methods (OnlineCP and DTD) only when one of split or refinement processes is performed to increase the accuracy (horizontal line R: average running time of competitor methods).